Non-stationary acoustic objects as atoms of voiced speech

Authors

  • Friedhelm R. Drepper
  • Ralf Schlüter
Abstract

In spite of the undisputedly high degree of non-stationarity of speech signals, the present-day determination of their acoustic features is based on the assumption that speech production can be described as a linear time-invariant (LTI) system on a time scale of about 20 ms [1]. In automatic speech recognition, the wide-sense stationarity of an LTI system is used as a prerequisite for the consistent estimation of Fourier spectra or of autoregressive models [1]. As an evolutionarily plausible supplement, speech perception is also assumed to be focused on acoustic features that are obtained by using the LTI assumption [2]. Present-day models of pitch perception are no exception [2, 3]. The empirical mode decomposition of Huang et al. [4] is one of very few methods suited to analyzing signals without assuming their stationarity. In the case of voiced speech, the formant frequencies have been shown to correlate well with the frequencies of the empirical modes [5]. However, it is hard to imagine how the so-called sifting process of Huang et al. [4] fits into the known function of the peripheral auditory pathway. The present study proposes a different method of empirical mode decomposition, which is based on well-known features of the auditory pathway. In further contrast to the decomposition of [4], the pitch-perception-oriented mode decomposition leads to mode reconstructions that can be confirmed to have uncorrupted phases in comparison with the phases of the underlying oscillatory subsystems [6, 7]. Furthermore, it is shown that voiced phones support a fast-converging iterative reconstruction. The empirical modes can be used advantageously to reconstruct a single fundamental oscillator, the frequency of which can be interpreted as the acoustic correlate of virtual pitch. Virtual pitch perception should be interpreted as an ingenious instrument of time scale separation, which separates the phonological time or frequency scales from the phonetically relevant ones. In contrast to numerous existing pitch tracking methods [2, 3], the virtual-pitch-oriented time scale separation does not rely on a frequency gap in the long-term spectrum (which is equivalent to temporary stationarity).
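
The conventional frame-based analysis that the abstract contrasts itself with can be illustrated by a minimal sketch. It is not the authors' method: it only shows the usual practice of treating each 20 ms frame as stationary and estimating one Fourier spectrum per frame. The function names `gliding_tone` and `frame_peak_frequencies`, the 8 kHz sample rate, and the synthetic test signal are all assumptions made for illustration.

```python
import cmath
import math


def gliding_tone(f_start, f_end, duration_s, sample_rate):
    """Synthetic test signal (assumption): a sinusoid whose frequency glides linearly."""
    n_total = round(duration_s * sample_rate)
    slope = (f_end - f_start) / duration_s  # frequency change in Hz per second
    samples = []
    for i in range(n_total):
        t = i / sample_rate
        phase = 2 * math.pi * (f_start * t + 0.5 * slope * t * t)
        samples.append(math.sin(phase))
    return samples


def frame_peak_frequencies(signal, sample_rate, frame_ms=20.0):
    """Return the peak DFT frequency (Hz) of each non-overlapping frame.

    Each frame is treated as if it were stationary, which is the
    LTI-style assumption described in the abstract.
    """
    n = int(sample_rate * frame_ms / 1000.0)  # samples per frame
    peaks = []
    for start in range(0, len(signal) - n + 1, n):
        frame = signal[start:start + n]
        # Hann window to reduce spectral leakage at the frame edges.
        windowed = [
            x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
            for i, x in enumerate(frame)
        ]
        best_bin, best_mag = 0, 0.0
        for k in range(1, n // 2):  # skip DC, scan positive frequencies
            coeff = sum(
                w * cmath.exp(-2j * math.pi * k * i / n)
                for i, w in enumerate(windowed)
            )
            if abs(coeff) > best_mag:
                best_bin, best_mag = k, abs(coeff)
        peaks.append(best_bin * sample_rate / n)
    return peaks


if __name__ == "__main__":
    tone = gliding_tone(100.0, 200.0, 0.2, 8000)
    print(frame_peak_frequencies(tone, 8000))
```

With 20 ms frames the frequency grid has a spacing of 50 Hz, so a glide inside a frame is reduced to a single bin. This is the kind of limitation that motivates analyses which do not assume stationarity.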

Similar resources

Non-stationary Self-consistent Acoustic Objects as Atoms of Voiced Speech

Voiced segments of speech are assumed to be composed of non-stationary voiced acoustic objects which are generated as the stationary (secondary) response of a non-stationary drive oscillator and which are analysed by introducing a self-consistent part-tone decomposition. The self-consistency implies that the part-tones (of voiced continuants) are suited to reconstruct a topologically equivalent imag...

Voiced speech as secondary response of a self-consistent fundamental drive

Voiced segments of speech are assumed to be composed of non-stationary acoustic objects which can be described as the stationary response of a non-stationary fundamental drive (FD) process and which are furthermore suited to reconstruct the hidden FD by using a voice-adapted (self-consistent) part-tone decomposition of the speech signal. The universality and robustness of human pitch perception enco...

Voiced speech as response of a self-consistent fundamental drive

Voiced segments of speech are assumed to be composed of non-stationary acoustic objects which can be described as stationary response of a non-stationary fundamental drive (FD) process and which are furthermore suited to reconstruct the hidden FD by using a voice adapted (self-consistent) part-tone decomposition of the speech signal. The universality and robustness of human pitch perception enc...

Voiced excitation as entrained p a reconstructed glottal ma

A time scale separation of voiced speech signals is introduced, which avoids the assumption of a frequency gap between the acoustic response and the prosodic drive. The non-stationary drive is extracted self-consistently from a voice-specific subband decomposition of the speech signal. When the band-limited prosodic drive is used as fundamental drive of a two-level drive-response model, the voic...

Non-stationary signal processing and its application in speech recognition

The most widely used acoustic feature extraction methods of current automatic speech recognition (ASR) systems are based on the assumption of stationarity. In this paper we extensively evaluate a recently introduced filter-stable, non-stationary signal processing method, which relies on an adaptive part-tone decomposition of voiced speech to obtain alternative feature vectors for ASR. The non-st...

Publication date: 2008